genzgd
left a comment
This seems okay to me, although I can't claim to have done anything resembling a full review. A couple observations:
- I'm curious as to where the improvements come from over the existing implementation, so I'm looking forward to that blog post.
- There's a lot of duplicated code in the aiohttp_client. It would be nice to consolidate that somewhere.
- The piece with the async queue is hard to follow -- I don't know how feasible it is, but it would be nice to remove that layer and just use some kind of async-based generator without wrapping the extra queue.
|
Thanks @genzgd. To address your questions:
|
Yes, as I think about it, that makes sense. It might be theoretically possible to run the sync HTTP client (and the buffer) in a separate thread from the parser, gaining a similar benefit. On a related note, making the transform step truly parallel would be challenging given that HTTP chunks won't align with Native format blocks, but that's another argument in favor of a TCP protocol client. :) |
|
For those interested, I have published a RC off this branch for testing and feedback: https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.12.0rc1 |
this is so great! we will be trying this in our staging environment and report back. |
|
Hi, I have been testing v0.12rc and I got an interesting improvement with OpenTelemetry context propagation for the async client. This is somewhat related to #303:
|
|
Has largely been working well for me. I've noticed some intermittent server disconnect issues, but I suspect I'm the cause of these somehow. |
|
Is this included in 0.15.0? |
|
Hi @thewhaleking, no, I cut 0.15.0 as kinda like the last release before the official roll to 1.0.0. I'll have a 1.0.0rc1 out sometime in the next few weeks or so that will include this. I did release 0.12.0rc1 a while back, which you can grab from PyPI if you want to try this out though. |
Summary
Replaces the old executor-based AsyncClient, which wrapped the sync HttpClient in a ThreadPoolExecutor, with a native async implementation built on aiohttp. The public API surface is unchanged: clickhouse_connect.get_async_client() returns an AsyncClient with the same methods. The difference is entirely under the hood, where real async I/O replaces thread-pool delegation.
Why this change
The previous AsyncClient ran every operation in a thread pool via loop.run_in_executor(). This:
The new implementation performs HTTP I/O natively with aiohttp, giving real concurrency benefits for async workloads.
Design
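To make the contrast between thread-pool delegation and native async I/O concrete, here is a minimal standard-library sketch. It does not use the driver or aiohttp: `blocking_request` and `native_request` are hypothetical stand-ins that simulate a network round trip with a sleep, and the delay, request count, and pool size are illustrative values only.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

IO_DELAY = 0.05  # simulated network round trip in seconds (illustrative value)


def blocking_request() -> str:
    """Stand-in for a synchronous HTTP call that blocks its thread."""
    time.sleep(IO_DELAY)
    return "ok"


async def native_request() -> str:
    """Stand-in for a native async HTTP call that yields to the event loop."""
    await asyncio.sleep(IO_DELAY)
    return "ok"


async def run_with_executor(n: int, workers: int) -> float:
    """Executor style: every in-flight request occupies one worker thread."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        start = time.perf_counter()
        await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_request) for _ in range(n))
        )
        return time.perf_counter() - start


async def run_native(n: int) -> float:
    """Native style: all requests wait concurrently on a single thread."""
    start = time.perf_counter()
    await asyncio.gather(*(native_request() for _ in range(n)))
    return time.perf_counter() - start


async def compare(n: int = 8, workers: int = 2) -> tuple[float, float]:
    """Return (executor_seconds, native_seconds) for n simulated requests."""
    return await run_with_executor(n, workers), await run_native(n)
```

Calling `asyncio.run(compare())` returns the two elapsed times. With more requests than worker threads, the executor variant has to process them in batches, while the native variant overlaps all of the waits.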
Native async I/O
Requests use aiohttp.ClientSession with a configurable TCPConnector (pool limits, keepalive). HTTP response handling is fully async. aiohttp is an optional dependency installed via pip install clickhouse-connect[async].
Streaming bridge for ClickHouse Native format
Native format parsing and serialization is synchronous CPU-bound work. The client uses a bounded queue in AsyncSyncQueue as a sync/async bridge so async network reads/writes can overlap with sync parsing/serialization in an executor.
On the query path in StreamingResponseSource, the async producer reads from the aiohttp response and the sync consumer parses in an executor. On the insert path in StreamingInsertSource, the sync producer serializes in an executor and the async consumer streams to aiohttp.
Event loop safety
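The sync/async bridge described above for the query path can be sketched with the standard library alone. This is not the driver's AsyncSyncQueue: `BridgeQueue`, `fake_network_chunks`, and `parse_all` are hypothetical names, and the sketch assumes one simple way to build such a bridge, namely an `asyncio.Queue` that the worker thread reads through `asyncio.run_coroutine_threadsafe`.

```python
import asyncio
from typing import Any, AsyncIterator, List

_DONE = object()  # sentinel marking the end of the stream


class BridgeQueue:
    """Hypothetical bounded bridge: async put on the loop, blocking get in a thread."""

    def __init__(self, loop: asyncio.AbstractEventLoop, maxsize: int = 4):
        self._loop = loop
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    async def put(self, item: Any) -> None:
        # Runs on the event loop; suspends when the queue is full (backpressure).
        await self._queue.put(item)

    def get_blocking(self) -> Any:
        # Runs in a worker thread; blocks that thread, not the event loop.
        future = asyncio.run_coroutine_threadsafe(self._queue.get(), self._loop)
        return future.result()


async def fake_network_chunks() -> AsyncIterator[bytes]:
    """Stand-in for reading chunks from an HTTP response."""
    for i in range(5):
        await asyncio.sleep(0)  # pretend to wait on the socket
        yield f"chunk-{i}".encode()


def parse_all(bridge: BridgeQueue) -> List[str]:
    """Synchronous consumer: stands in for CPU-bound parsing."""
    parsed = []
    while True:
        item = bridge.get_blocking()
        if item is _DONE:
            return parsed
        parsed.append(item.decode().upper())


async def stream_and_parse() -> List[str]:
    loop = asyncio.get_running_loop()
    bridge = BridgeQueue(loop, maxsize=2)
    # Start the sync consumer in the default executor, then feed it from the loop.
    consumer = loop.run_in_executor(None, parse_all, bridge)
    async for chunk in fake_network_chunks():
        await bridge.put(chunk)
    await bridge.put(_DONE)
    return await consumer
```

The sketch leaves out error and cancellation propagation. A production bridge would also need to unblock the consumer if the producer fails, and vice versa.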
Non-streaming query methods like .query(), .query_df(), etc. are fully materialized inside the executor before returning. By the time a QueryResult is returned, all data is in memory, so synchronous iteration won't block the loop.
Streaming queries like .query_rows_stream(), .query_df_stream(), etc. detect synchronous iteration from within an async context and raise ProgrammingError immediately, prompting the caller to use async for instead.
Lazy imports
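The guard against synchronous iteration of streaming results, described above, could look roughly like the following. The PR text does not say how the detection is implemented, so this sketch assumes a check based on `asyncio.get_running_loop()`. `GuardedStream` is a hypothetical class, and `ProgrammingError` here is a local stand-in for the driver's exception of the same name.

```python
import asyncio
from typing import AsyncIterator, Iterator, List


class ProgrammingError(Exception):
    """Local stand-in for the driver's ProgrammingError."""


def _in_event_loop() -> bool:
    """Return True when called from code running inside an event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


class GuardedStream:
    """Hypothetical stream that refuses blocking iteration inside a loop."""

    def __init__(self, blocks: List[List[int]]):
        self._blocks = blocks

    def __iter__(self) -> Iterator[List[int]]:
        if _in_event_loop():
            raise ProgrammingError(
                "Synchronous iteration inside an event loop; use 'async for'"
            )
        return iter(self._blocks)

    async def __aiter__(self) -> AsyncIterator[List[int]]:
        for block in self._blocks:
            await asyncio.sleep(0)  # yield control, as a real network read would
            yield block


async def demo() -> List[List[int]]:
    stream = GuardedStream([[1, 2], [3]])
    try:
        list(stream)  # sync iteration from async code: rejected
    except ProgrammingError as exc:
        print(f"rejected: {exc}")
    return [block async for block in stream]
```

Outside an event loop, plain `for` iteration over the same object still works, which matches the idea that only blocking the loop is the problem.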
aiohttp is imported lazily so import clickhouse_connect works without it installed. Attempting to use the async client without aiohttp raises a clear ImportError with install instructions. Heavy optional dependencies (numpy, pandas, pyarrow, polars) are also lazily loaded, matching the sync client.
Breaking changes
- AsyncClient(client=sync_client) no longer works. Use get_async_client() or create_async_client().
- executor_threads and executor parameters have been removed from create_async_client().
- pool_mgr is rejected on the async path with a message pointing to connector_limit/connector_limit_per_host.
- clickhouse_connect.driver.aiohttp_client no longer exists. AsyncClient is importable from clickhouse_connect.driver as before.
Migration
The async client API is otherwise identical. All query, insert, and streaming methods have the same signatures.
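As a rough illustration of the migration, the sketch below creates the client through get_async_client() instead of wrapping a sync client. The connection details are placeholders. Passing connector_limit and connector_limit_per_host as keyword arguments, and awaiting close(), are assumptions based on the names mentioned in the breaking changes, not confirmed signatures.

```python
import asyncio


async def main() -> None:
    # Imported here so the sketch can be loaded without the package installed.
    import clickhouse_connect

    # No longer supported, per the breaking changes:
    #   AsyncClient(client=sync_client)
    client = await clickhouse_connect.get_async_client(
        host="localhost",             # placeholder connection details
        username="default",
        password="",
        connector_limit=20,           # assumed keyword arguments; values are
        connector_limit_per_host=10,  # illustrative, not recommendations
    )
    try:
        result = await client.query("SELECT 1")
        print(result.result_rows)
    finally:
        await client.close()  # assumed to be awaitable in the native client


# To run against a reachable server:
#   asyncio.run(main())
```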
Tests
The full integration test suite runs parametrized across both sync and async clients. Dedicated async tests in test_async_features.py cover concurrency, streaming cleanup, session protection, timeouts, and error isolation.
Performance
Benchmarks comparing the old executor-based client against the native async client showed speedups ranging from parity to 75% depending on workload, with a geometric mean improvement of around 16% across a wide range of realistic workloads. P95 latencies also improved significantly.
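For readers unfamiliar with why a geometric mean is used to summarize speedups, here is a small worked sketch. The ratios below are made-up illustrative values, not the benchmark results from this PR. Each one is assumed to be old time divided by new time, so 1.0 means parity.

```python
import math
from statistics import geometric_mean

# Hypothetical per-workload speedup ratios (old time / new time).
# These are NOT the PR's benchmark data.
speedups = [1.00, 1.10, 1.25, 1.75]

geo = geometric_mean(speedups)
arith = sum(speedups) / len(speedups)

# The geometric mean averages ratios multiplicatively, so a single large
# speedup pulls the summary up less than it would with an arithmetic mean.
print(f"geometric mean speedup:  {geo:.3f}")
print(f"arithmetic mean speedup: {arith:.3f}")
```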
Trade-offs
.query(), .query_df(), etc. are fully materialized in the executor before returning to the caller, which is the expected behavior for those APIs. Streaming variants like .query_rows_stream(), etc. are available for incremental processing.
Checklist